Ola-7B is a multimodal language model jointly developed by Tencent, Tsinghua University, and Nanyang Technological University, built on the Qwen2.5 architecture. It accepts image, video, audio, and text inputs and generates text output.
Multimodal Fusion
Safetensors
Supports Multiple Languages